Robust semantic text similarity using LSA, machine learning, and linguistic resources

Authors

Abhay L. Kashyap

Lushan Han

Roberto Yus

Jennifer Sleeman

Taneeya Satyapanich

Sunil Gandhi

Timothy W. Finin

Abstract

Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines Latent Semantic Analysis and machine learning augmented with data from several linguistic resources. We used a simple term alignment algorithm to handle longer pieces of text. Additional wrappers and resources were used to handle task-specific challenges that include processing Spanish text, comparing text sequences of different lengths, handling informal words and phrases, and matching words with sense definitions. In the *SEM 2013 task on Semantic Textual Similarity, our best performing system ranked first among the 89 submitted runs. In the SemEval-2014 task on Multilingual Semantic Textual Similarity, we ranked a close second in both the English and Spanish subtasks. In the SemEval-2014 task on Cross-Level Semantic Similarity, we ranked first in the Sentence–Phrase, Phrase–Word, and Word–Sense subtasks and second in the Paragraph–Sentence subtask.
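As a rough illustration of how a word similarity component and a term alignment step can combine into a text-level score, here is a minimal sketch. It is not the SemSim implementation: the abstract does not give the alignment details, so the greedy best-match alignment and the symmetric averaging are assumptions, and `toy_word_similarity` is a hypothetical character-bigram stand-in for the LSA-based word similarity model. All function names are invented for this example.

```python
from typing import Callable, Sequence

WordSim = Callable[[str, str], float]


def toy_word_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a real word similarity model.

    Returns the Dice overlap of character bigrams, in [0, 1].
    A real system would query a distributional (e.g. LSA) model here.
    """
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def directional_score(source: Sequence[str], target: Sequence[str],
                      sim: WordSim) -> float:
    """Align each source term to its best-matching target term and average."""
    if not source or not target:
        return 0.0
    return sum(max(sim(s, t) for t in target) for s in source) / len(source)


def aligned_similarity(text_a: str, text_b: str,
                       sim: WordSim = toy_word_similarity) -> float:
    """Symmetric text similarity: mean of the two directional alignment scores."""
    tokens_a = text_a.split()
    tokens_b = text_b.split()
    return 0.5 * (directional_score(tokens_a, tokens_b, sim)
                  + directional_score(tokens_b, tokens_a, sim))


if __name__ == "__main__":
    print(aligned_similarity("the cat sleeps", "a cat sleeping"))
    print(aligned_similarity("the cat sleeps", "stock markets fell"))
```

Passing the word similarity function as a parameter keeps the alignment logic independent of the underlying model, so the toy measure could be swapped for a distributional one without touching the rest of the sketch.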


Similar resources

An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition

In this paper, we try to find empirically the optimal dimensionality in data-driven models, Latent Semantic Analysis (LSA) model and Probabilistic Latent Semantic Analysis (PLSA) model. These models are used for building linguistic semantic knowledge which could be used in estimating contextual semantic similarity for the target word selection in English-Korean machine translation. We also faci...


Statement for Irina Matveeva

My research interest is to improve natural language applications by developing efficient unsupervised and semi-supervised machine learning approaches. My approach is to design machine learning solutions tailored to specific natural language problems based on an in-depth analysis of their components. I believe that machine learning algorithms are most efficient for language applications if they ...


The Plots of Children and Machines: The Statistical and Symbolic Semantic Analysis of Narratives

This thesis presents a method of automatic plot analysis of narrative texts that uses both components of traditional symbolic analysis of natural language and statistical machine-learning. In particular, we are investigating the story rewriting task. In the story rewriting task, an exemplar story is read to the pupils and the pupils rewrite the story in StoryStation, which allows them to concen...


A Solution to Plato's Problem:

How do people know as much as they do with as little information as they get? The problem takes many forms; learning vocabulary from text is an especially dramatic and convenient case for research. A new general theory of acquired similarity and knowledge representation, Latent Semantic Analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguist...


Towards Deeper Understanding of the LSA Performance

The paper presents on-going work towards deeper understanding of the factors influencing the performance of the Latent Semantic Analysis (LSA). Unlike previous attempts that concentrate on problems such as matrix elements weighting, space dimensionality selection, similarity measure etc., we primarily study the impact of another, often neglected, but fundamental element of LSA (and of any text ...



Journal:

Language Resources and Evaluation

Volume 50

Publication date: 2016
